Chromatin analysis of adult pluripotent stem cells reveals a unique stemness maintenance strategy

Many highly regenerative organisms maintain adult pluripotent stem cells throughout their life, but how the long-term maintenance of pluripotency is accomplished is unclear. To decipher the regulatory logic of adult pluripotent stem cells, we analyzed the chromatin organization of stem cell genes in the planarian Schmidtea mediterranea. We identify a special chromatin state of stem cell genes, which is distinct from that of tissue-specific genes and resembles constitutive genes. Where tissue-specific promoters have detectable transcription factor binding sites, the promoters of stem cell–specific genes instead have sequence features that broadly decrease nucleosome binding affinity. This genic organization makes pluripotency-related gene expression the default state in these cells, which is maintained by the activity of chromatin remodelers ISWI and SNF2 in the stem cells.

The PDF file includes:
Figs. S1 to S15
Legends for tables S1 to S3, S6 and S7
Tables S4 and S5

Other Supplementary Material for this manuscript includes the following:
Tables S1 to S3, S6 and S7
Throughout all datasets, the similarity of neoblast genes and constitutive genes is detected, whereas tissue-specific genes follow a different profile. In data from Gehrke and Li, the shift of the ATAC peak in neoblast genes and constitutive genes is confirmed: only differentiated tissue genes show a peak upstream of the TSS, whereas for neoblast genes and constitutive genes the peak is on top of the TSS.

Figure S1: Supporting data on the classification of tissue-specific and constitutive gene sets. A. Boxplots showing the expression (log2(TPM)) of the constitutive genes in the RNA-seq data from the different tissues (brain, epidermis, intestine, neoblast). B. Boxplots of the log2(ATAC_RPKM) over the TSS region (± 1kb) of the constitutive genes in the different cell isolations (brain, epidermis, intestine, neoblast). C. Boxplots of the log2(H3K4me3_RPKM) over the TSS region (± 1kb) of the constitutive genes in the different cell isolations (brain, epidermis, intestine, neoblast). D. Expression levels of the various gene clusters in cell types defined by single cell RNA sequencing confirm the assignments (public data processed: PRJNA276084 (16)). Note that mature epidermal cells that are enriched in the epidermal isolation are missing from the single cell datasets due to technical incompatibility. E. Average profile plot and heatmaps of H3K4me3 signal over the various gene clusters from ChIPseq (reference-adjusted reads per million: RRPM) and CUT&Tag (RPKM) show a comparable distribution of reads. ChIPseq and CUT&Tag shown are from isolated neoblasts (ChIPseq public data reprocessed: GSE74169 (18)). F and G. Location of the H3K4me3 CUT&Tag peaks (F) and ATACseq peaks (G) detected in each cell isolation. The promoter is defined as the 2kb upstream of the TSS. H. Average profile plot of intestine ATACseq (in RPKM) over weakly expressed genes (1 to 10 TPM), non-expressed genes (< 1 TPM), genes with ATACseq peaks in introns, and intestine-specific genes. I. Average profile plot of epidermis ATACseq (in RPKM) over weakly expressed genes (1 to 10 TPM), non-expressed genes (< 1 TPM), genes with ATACseq peaks in introns, and epidermis-specific genes. J to M. Gene size (J), exon number (K), cDNA size (L), and intron size (average per transcript) (M) per gene cluster. Highly expressed genes tend to be smaller than low-expressed genes, except among neoblast-specific genes. The difference largely correlates with larger intron size and higher intron number in the low-expressed genes.

Figure S3: Profile plots of the TSS regions of the different gene clusters. A and B. Shown are average RPKM profile plots over the TSS region (± 1kb) of ATACseq (left) and H3K4me3 (right) data from each of the cell isolations for the constitutive gene cluster (A, plot with gray background) and the intestine-specific gene cluster (B, plot with green background). Constitutive genes show a high H3K4me3 signal in the absence of a clear ATACseq peak in each of the isolations. The intestinal genes show an intestine-specific ATAC peak and increased H3K4me3 signal in the intestinal sample. C to F. Shown are average RPKM profile plots over the TSS region (± 1kb) of ATACseq and H3K4me3 CUT&Tag data for each of the different gene clusters in brain (C), epidermis (D), intestine (E) and neoblast (F) isolations. Each of the differentiated tissues shows an ATAC peak of its tissue-specific genes.

Figure S4: Control experiments to exclude technical artifacts from ATACseq experiments. A. Comparison of the library insert size of epidermis treated with Hoechst versus epidermis without Hoechst staining (Li et al., PRJNA633618 (30)). Hoechst staining does not have a negative effect on library quality. B. Comparison of the library insert size of G1/G0 stem cells (X2) versus G2/M phase neoblast cells (X1). Cells in G2/M phase do not show a significantly altered profile compared to cells in G1 phase. C. Pearson correlation matrix of the ATACseq datasets used in this analysis: epidermis, epidermis treated with Hoechst, intestine, brain, neoblast and G1/G0 cells, computed with Deeptools (82). D to G. Average profile plots (in RPKM) over the TSS region (± 1kb) of epidermis with Hoechst (D), epidermis without Hoechst (E), X2 cells (F), and X1 cells (G). Again, no major effects of Hoechst staining or cell cycle state are detected.

Figure S6: Analysis of the promoter organization of tissue-specific and constitutive genes in the acoel Hofstenia miamia. A. Re-analysis and UMAP of sequenced cells (PRJNA889328) (33) of hatchling worms recovers cluster identity similar to results obtained by the original authors, corresponding to identifiable cell types, including neoblasts. These clusters were used for the identification of tissue-specific gene sets. A gene was defined as tissue-specific if it was more than 2-fold enriched in one tissue compared to each of the other tissues. A gene was defined as constitutive if the enrichment for all the tissue comparisons was between 0.5 and 2-fold. Based on these criteria, we identified 373 intestine-specific genes, 1913 epidermis-specific genes, 1631 neuron-specific genes, 408 stem cell-specific genes, and 150 constitutive genes. B. Aggregated expression over the cell types of the various clusters of tissue-specific and constitutive genes defined in A. C. Median RPKM profile plots over the predicted TSS region (± 1kb) of tissue-specific and constitutive genes, based on reanalyzed ATACseq datasets of merged tail and head 0h after amputation (PRJNA512373 (32)). Whereas epidermal and intestinal genes have a clear peak in chromatin accessibility around the TSS, neoblast genes and constitutive genes lack this organization. D. Analysis of the AT content over the region surrounding the TSS (-/+ 500 bases around the TSS) of Hofstenia gene clusters. The bar plot represents the aggregated percentage of nucleotide content in each gene cluster. The other plots represent the nucleotide representation at each position over the region of interest. Similarly to the situation in S. mediterranea, neoblast-specific genes and constitutive genes have a higher T-content in the region upstream of the TSS. E. Quantification of T-stretches observed over the TSS in the different gene clusters. The sequences analyzed correspond to the 500 bases upstream of the TSS and are reverse complemented when genes are on the reverse strand. Neoblast genes and constitutive genes have more stretches of 4 or more Ts in their promoter regions. Please note that due to limited data and the absence of tissue-specific data, the predictions of the TSS positions are suboptimal, which increased the noise in this analysis.
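The fold-enrichment criteria stated in panel A can be illustrated with a short sketch. This is not the authors' pipeline: the function name `classify_gene`, the input layout (one aggregated expression value per tissue), the pseudocount, and the inclusive 0.5 to 2 bounds are all assumptions made for illustration.

```python
from itertools import permutations
from typing import Dict, Optional


def classify_gene(expr: Dict[str, float], pseudocount: float = 1.0) -> Optional[str]:
    """Classify one gene from its per-tissue expression values.

    Returns the tissue name if the gene is more than 2-fold enriched in that
    tissue compared to each of the other tissues, "constitutive" if every
    pairwise ratio lies between 0.5 and 2, and None otherwise.
    The pseudocount (an assumption) avoids division by zero.
    """
    def ratio(a: str, b: str) -> float:
        return (expr[a] + pseudocount) / (expr[b] + pseudocount)

    tissues = list(expr)

    # Tissue-specific: > 2-fold over every other tissue.
    for t in tissues:
        if all(ratio(t, o) > 2 for o in tissues if o != t):
            return t

    # Constitutive: all pairwise comparisons within [0.5, 2] (bounds assumed inclusive).
    if all(0.5 <= ratio(a, b) <= 2 for a, b in permutations(tissues, 2)):
        return "constitutive"

    return None
```

A gene that passes neither test is left unassigned, which matches the legend in that the listed gene sets do not need to cover every gene.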

Figure S7: Knockdown of HNF4 affects intestinal gene expression. A. Expression levels of planarian transcription factors in each of the tissue isolations. Many of the transcription factors have high expression in the neoblasts. Intestinal transcription factors are enriched in the intestine. B. Metaplots (average RPKM) depicting chromatin accessibility over several predicted TF binding motifs of interest from panel A (± 500b) in each of the cell isolations. C. RNAi-mediated knockdown of the predicted intestinal transcription factor HNF4 indeed results in reduced expression of a subset of the known intestinal genes (shown in light green) and predicted intestinal HNF4 targets (dark green), with minimal effects on neoblast (yellow), constitutive (grey), and epidermal (blue) genes. Statistical significance is determined using a Student's t-test (p-value: * ≤0.05, ** ≤0.01, *** ≤0.005).

Figure S8: Enhancer motif analysis. A. Enriched motifs in ATACseq peaks not localized in the promoter regions and assigned as putative enhancers of intestinal, epidermal or neoblast-specific genes. Putative enhancer regions were assigned to the closest TSS. No significantly enriched motifs were found in putative enhancers in proximity to brain-specific or constitutive genes. B. Non-promoter ATAC peaks are primarily located in introns and intergenic ("unknown") regions. C. Average profile plots and heat maps of accessible regions outside of gene promoters. Shown is accessibility at the non-promoter peaks of either epidermis-specific genes (top group), intestine-specific genes (middle group), or neoblast-specific genes (bottom group), relative to the peak center. Epidermal and intestinal peaks are largely tissue-specific. However, the genes that have enhancer accessibility in neoblasts have relatively broad and weak peaks, and this accessibility tends to be present in other tissue isolations as well.

Figure S9: Nucleotide stretches observed over the TSS in the different gene clusters. A. Frequency of stretches of A nucleotides. B. T nucleotides. C. C nucleotides. D. G nucleotides. The sequences analyzed correspond to the 500 bases upstream of the TSS and are reverse complemented when genes are on the reverse strand. A- and T-stretches are more abundant than C- or G-stretches. Only T-stretches are significantly enriched in neoblast and constitutive gene promoter regions.
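The stretch quantification described in this legend (500 bases upstream of the TSS, reverse complemented for reverse-strand genes, runs of a single nucleotide) can be sketched as follows. The helper names, the 0-based TSS coordinate convention, and the minimum run length of 4 (taken from the Figure S6 legend) are assumptions for illustration and may differ from the authors' implementation.

```python
import re

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_sequence(chrom_seq: str, tss: int, strand: str, length: int = 500) -> str:
    """Extract `length` bases upstream of the TSS, oriented 5'->3' on the gene strand.

    `tss` is assumed to be the 0-based index of the first transcribed base.
    """
    if strand == "+":
        start = max(0, tss - length)
        return chrom_seq[start:tss].upper()
    if strand == "-":
        # Upstream of a reverse-strand gene lies at higher genomic coordinates.
        end = min(len(chrom_seq), tss + 1 + length)
        return reverse_complement(chrom_seq[tss + 1:end]).upper()
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def count_stretches(seq: str, base: str = "T", min_len: int = 4) -> int:
    """Count non-overlapping runs of `base` that are at least `min_len` long."""
    pattern = re.compile(f"{re.escape(base)}{{{min_len},}}")
    return len(pattern.findall(seq))
```

Because the regular expression is greedy, a run of seven Ts is counted once rather than as several overlapping 4-mers; whether the original analysis counted runs or overlapping windows is not stated in the legend.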

Figure S10: Hi-C interaction maps of A/B compartments. A. Size distribution of the A and B compartments. B. Pie chart of the percentage of 50kb bins annotated as A, B and NA (not attributed) compartments. C. Comparison of the H3K9me3 content between the A and B compartments. D. Distribution of genes between the A and B compartments (weak genes: less than 10 TPM). E. Distribution of gene expression levels in the A and B compartments (weak genes: less than 10 TPM).

Figure S12: Analysis of A/B compartment features per chromosome. A. Distribution of the genes in A and B compartments per chromosome (related to Supplemental figure 10D). We used the TSS +1 base as the coordinate for each gene. B. Gene expression in TPM over the A/B compartments (related to Supplemental figure 10E). C. Chromatin characteristics of the A/B compartments (related to Figure 4B and Supplemental figure 10C). The data of ATAC, H3K4me3 and H3K9me3 of the various tissues were merged for this analysis, and the RPKM was computed over the 50kb bins of the A/B compartment annotation. D. Percentage of transposable element annotation per family over the A/B compartments compared to the whole genome percentage (in gray) (related to Figure 4D).
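As a rough illustration of the two operations mentioned in this legend, the sketch below computes RPKM for a fixed-size bin using the standard definition (reads per kilobase of region per million mapped reads) and looks up the compartment of a gene from a single coordinate. The function names, the list-of-labels representation of the compartment annotation, and the 0-based coordinate are assumptions and not taken from the study.

```python
from typing import Sequence

BIN_SIZE = 50_000  # 50kb bins, as used for the A/B compartment annotation


def rpkm(read_count: int, region_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of region per million mapped reads."""
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


def compartment_of_gene(coord: int, bin_labels: Sequence[str], bin_size: int = BIN_SIZE) -> str:
    """Return the compartment label ("A", "B" or "NA") of the bin containing `coord`.

    `bin_labels` is a hypothetical per-chromosome list with one label per
    consecutive bin; `coord` is the single coordinate chosen for the gene.
    """
    idx = coord // bin_size
    if 0 <= idx < len(bin_labels):
        return bin_labels[idx]
    return "NA"
```

Using a single coordinate per gene means each gene falls into exactly one bin, so genes spanning a compartment boundary are not counted twice.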

Figure S13: Overview of planarian chromatin remodelers. A. Phylogenetic tree of the proteins with SNF2-like domains (S. mediterranea proteins are indicated in bold). The color of the branches indicates the bootstrap score. Ce: Caenorhabditis elegans; Dr: Danio rerio; Dm: Drosophila melanogaster; Hs: Homo sapiens; Mm: Mus musculus; Sc: Saccharomyces cerevisiae; Sm: Schmidtea mediterranea. B. Heatmap of RNAseq-based expression levels of the genes related to SNF2. The color scale depicts the TPM, with blue representing the lowest expression and yellow representing the highest. C. Schematic of protein domains of the S. mediterranea chromatin remodeler orthologues made with Interproscan (97).

Table S4. Sequences of the primers used in this study.

Table S5. Number of genes per cluster and average TPM in the different tissues.

Table S6 (separate file). Complete listing of the genes that make up the various S. mediterranea gene clusters.

Table S7 (separate file). Complete listing of the genes that make up the various H. miamia gene clusters.